Plotting fluorescence data
Data from the Synergy H1

Python project

Maricela Carrera

Introduction

Plate reader

Plots generated with the Gen5 software

Data format

Time A1 A2 A3 A4 A5 A6 A7 A8 A9
0:29:10 2004 1974 1942 1808 1799 1806 2526 1899 1899
0:59:10 1794 1819 1911 1722 1675 1734 2416 1738 1738
1:29:10 1845 1902 1871 1738 1822 1655 2354 1758 1758
1:59:10 1905 2057 1911 1691 1955 1805 2462 1831 1831
2:29:10 1900 2018 2080 1852 2046 1813 2532 2038 2038
2:59:10 1980 2114 2071 1814 1996 1980 2555 1871 1871
3:29:10 2018 1954 2022 1872 2311 1956 2592 1869 1869
3:59:10 1993 2064 2183 1737 2303 1910 2705 2056 2056
4:29:10 1966 2142 1994 1887 2512 1893 2772 2032 2032
4:59:10 1934 2036 2028 1925 2608 1914 2811 1976 1976
5:29:10 1934 1956 2051 1733 2582 1824 2727 1967 1967
5:59:10 1985 1961 1987 1801 2768 1806 2914 1822 1822
6:29:10 1940 1955 1832 1648 2848 1669 2949 1818 1818
6:59:10 1857 1880 2016 1643 3010 1632 2869 1808 1808
7:29:10 1725 1760 1839 1573 3041 1623 2877 1738 1738
7:59:10 1738 1784 1696 1440 3095 1555 2924 1589 1589
8:29:10 1723 1748 1729 1479 3407 1465 2983 1598 1598
8:59:10 1485 1612 1524 1389 3330 1327 2980 1497 1497
9:29:10 1447 1382 1648 1295 3451 1221 2874 1463 1463
9:59:10 1457 1391 1485 1162 3749 1142 3085 1320 1320
10:29:10 1370 1464 1418 1152 3828 1123 3104 1302 1302

Sample loading

Problem

The laboratory generates data from readings of enzymatic activity measured by fluorescence. These data have to be reordered before any statistical analysis, and the final result of the analysis is a set of plots.

Unordered samples

Solving the loading problem

Standardize the input data:

Figure 1: Samples A, B, and C with 3 replicates each

The input matrix is ordered by sample and by replicate.
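
The reordering step can be sketched as a column selection. This is only an illustration: the `layout` mapping below is a hypothetical plate layout (the slides do not state which wells hold which sample), and `order_by_sample` is a helper name introduced here, not part of the original script.

```python
import pandas as pd

# Hypothetical layout: the wells that hold the replicates of each sample.
layout = {
    "A": ["A1", "A4", "A7"],
    "B": ["A2", "A5", "A8"],
    "C": ["A3", "A6", "A9"],
}


def order_by_sample(df, layout):
    """Return df with its columns regrouped so replicates of a sample are adjacent."""
    ordered = [well for wells in layout.values() for well in wells]
    missing = [well for well in ordered if well not in df.columns]
    if missing:
        raise KeyError(f"Wells not found in the data: {missing}")
    return df[ordered]


# One row of made-up readings, just to show the column order.
df = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6, 7, 8, 9]],
    columns=[f"A{i}" for i in range(1, 10)],
    index=pd.Index(["0:29:10"], name="Time"),
)
ordered_df = order_by_sample(df, layout)
print(ordered_df.columns.tolist())
```

After this step, every block of three adjacent columns belongs to one sample, which is the input order the plotting script expects.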

Program

import argparse
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

def plot_time_vs_average(file_path, rep, sep, output_graph, sample_names):
    """
    Plot time vs. average per sample.

    Plots the results of fluorescence measurements of liquid samples from 96-well plates
    (or any other matrix of wells).
    Reads a CSV file containing time series data and plots the average value per sample over time.

    Usage:
        python plot_data_96plate.py <file> [--rep <step_size>] [--sep <delimiter>] [--output <output_file>] [--sample-names <names>...]

    Arguments:
        file                   : Path to the input CSV file. Its columns must be pre-ordered by sample and replicate, as in the example data.
        --rep <replicates>     : Number of replicates per sample; the help text calls it "Step size for sub-coordinates" (default: 3).
        --sep <delimiter>      : Delimiter used in the input file (default: ',').
        --output <output_file> : File name to save the output graph.
        --sample-names <names> : Custom names for the samples in the plot.

    Example:
        python plot_script.py data.csv --rep 3 --sep ';' --output graph.png --sample-names Blank Mutant_01 Mutant_02

    Example Data:
                    A1    A2    A3    A4    A5    A6    A7    A8    A9
        Time
        0:29:10   2004  1974  1942  1808  1799  1806  2526  1899  1899
        0:59:10   1794  1819  1911  1722  1675  1734  2416  1738  1738
        1:29:10   1845  1902  1871  1738  1822  1655  2354  1758  1758
        ...
    """
    # Read the file and retrieve the column names
    df = pd.read_csv(file_path, index_col=0, sep=sep)

    coordinates = df.columns.tolist()

    sub_coordinates_dict = {}

    for i in range(0, len(coordinates)-rep+1, rep):
        sub_coordinates = coordinates[i:i+rep]
        key = f'Sample_{i//rep + 1}'  # Generate unique key
        sub_coordinates_dict[key] = sub_coordinates

    # Calculate average per sample on each time
    averages_per_time = {}
    for time, row in df.iterrows():
        for key, sub_coordinates in sub_coordinates_dict.items():
            sample_values = row[sub_coordinates]
            average = sample_values.mean()
            if time in averages_per_time:
                averages_per_time[time][key] = average
            else:
                averages_per_time[time] = {key: average}

    # Prepare data for plotting
    data = []
    for time, averages in averages_per_time.items():
        for sample, average in averages.items():
            data.append({'Time': time, 'Sample': sample, 'Average': average})

    # Convert data to DataFrame
    df_plot = pd.DataFrame(data)

    # Customize sample names
    if sample_names:
        sample_names_dict = dict(zip(sub_coordinates_dict.keys(), sample_names))
        df_plot['Sample'] = df_plot['Sample'].replace(sample_names_dict)

    # Plotting
    sns.scatterplot(data=df_plot, x='Time', y='Average', hue='Sample')
    plt.xticks(rotation=90)

    # Export the plot
    if output_graph:
        plt.savefig(output_graph)

    plt.show()

if __name__ == '__main__':
    # Create argument parser
    parser = argparse.ArgumentParser(description='Plot time vs average per sample.')

    # Add arguments
    # Take the path as a plain string; pandas opens and closes the file itself
    parser.add_argument('file', type=str, help='Input file path')
    parser.add_argument('--rep', type=int, default=3, help='Step size for sub-coordinates for the sample (default: 3)')
    parser.add_argument('--sep', type=str, default=',', help='Delimiter for input file (default: ",")')
    parser.add_argument('--output', type=str, help='Output graph file name')
    parser.add_argument('--sample-names', type=str, nargs='+', help='Customize sample names')

    # Parse arguments
    args = parser.parse_args()

    # Call the plot function with provided arguments
    plot_time_vs_average(args.file, args.rep, args.sep, args.output, args.sample_names)
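
The grouping and averaging steps of the script can also be written without looping over rows. The sketch below is an alternative formulation, not the author's implementation: `average_replicates` is a name introduced here, and it assumes the columns are already ordered so that each block of `rep` adjacent columns is one sample. Like the script, it ignores trailing columns that do not fill a complete block.

```python
import pandas as pd


def average_replicates(df, rep=3):
    """Average each block of `rep` adjacent columns into one column per sample."""
    n_samples = len(df.columns) // rep
    return pd.DataFrame(
        {
            f"Sample_{k + 1}": df.iloc[:, k * rep:(k + 1) * rep].mean(axis=1)
            for k in range(n_samples)
        }
    )


# First two rows of the example data.
df = pd.DataFrame(
    [
        [2004, 1974, 1942, 1808, 1799, 1806, 2526, 1899, 1899],
        [1794, 1819, 1911, 1722, 1675, 1734, 2416, 1738, 1738],
    ],
    columns=[f"A{i}" for i in range(1, 10)],
    index=pd.Index(["0:29:10", "0:59:10"], name="Time"),
)

averages = average_replicates(df, rep=3)

# Long format with the same columns the script passes to seaborn.
df_plot = averages.reset_index().melt(
    id_vars="Time", var_name="Sample", value_name="Average"
)
print(averages)
print(df_plot)
```

The resulting `df_plot` has the `Time`, `Sample`, and `Average` columns that the `sns.scatterplot` call in the script uses, although its rows are ordered by sample rather than by time.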

Dependencies

The code requires the following dependencies:

  • pandas (version 1.3.3 or above)
  • seaborn (version 0.11.2 or above)
  • matplotlib (version 3.4.3 or above)
  • standard-library modules: argparse and sys
  • Python 3.11 or another Python 3 release

Manual

How to run it:

python plot_script.py data.csv --rep 3 --sep ';' --output graph.png --sample-names Blank Mutant_01 Mutant_02

Usage:
            python plot_data_96plate.py <file> [--rep <step_size>] [--sep <delimiter>] [--output <output_file>] [--sample-names <names>...]

Arguments:
            file                   : Path to the input CSV file. Its columns must be pre-ordered by sample and replicate, as in the example data.
            --rep <replicates>     : Number of replicates per sample; the help text calls it "Step size for sub-coordinates" (default: 3).
            --sep <delimiter>      : Delimiter used in the input file (default: ',').
            --output <output_file> : File name to save the output graph.
            --sample-names <names> : Custom names for the samples in the plot.

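Keep in mind:

  1. The data must be ordered: the replicates of each sample have to sit in adjacent columns.
  2. Number of columns: it should be a multiple of --rep, because trailing columns that do not complete a group are left out of the plot.
  3. The script only calculates the average of the replicates and plots it.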
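
The column count can be checked before running the script. In the listing above, `range(0, len(coordinates)-rep+1, rep)` skips any trailing columns that do not fill a complete group, so a mismatch passes silently. The sketch below uses a hypothetical helper, `check_columns`, which is not part of the original script and raises an error instead.

```python
def check_columns(columns, rep):
    """Return the number of complete samples, or raise if columns are left over."""
    if rep < 1:
        raise ValueError("rep must be a positive integer")
    n_samples, leftover = divmod(len(columns), rep)
    if leftover:
        raise ValueError(
            f"{len(columns)} columns cannot be split into groups of {rep}; "
            f"{leftover} column(s) would be ignored: {list(columns[-leftover:])}"
        )
    return n_samples


# Nine wells with three replicates each give three samples.
n_samples = check_columns([f"A{i}" for i in range(1, 10)], rep=3)
print(n_samples)
```

A call such as `check_columns(df.columns.tolist(), rep)` could be placed right after the file is read, so that a wrong `--rep` value is reported before anything is plotted.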

Results

Two types of plots

With custom --sample-names

Without custom names